AEP v2.0.0 — autonomy loop (G2–G7 + full_auto)#11
Merged
Conversation
Web research on loop engineering (5 building blocks, ReAct, Ralph loop) mapped against current AEP workflow. Scorecard + gap classification (G1 fresh-context, G2 recovery ladder, G4 post-merge guard, G5 telemetry reflect, G6 self-feeding discovery, G7 hygiene) with priority ordering.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
All 7 gap-fill methods (G1-G7) confirmed cross-host compatible via the executor abstraction. Resolved two caveats: G3 visual evaluator (Codex confirmed multimodal), G7 unifies on --max-turns (drop codex-only token_budget as primary). G1 standardizes on exec/headless one-shot per task to avoid nesting limits.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Spawn granularity in AEP is the story (one worker per story per round); deliberately not subdividing into per-task fresh contexts. G1 moved to a "Rejected" record with rationale; scorecard, gap buckets, priority, and compatibility tables updated. Gaps now G2-G7 (6 methods).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Post-deploy staging/prod validation with host-aware method selection: Claude Code auto-detects agent-browser; Codex uses native in-app browser+computer-use (desktop) or Playwright scripts (headless codex-exec, since computer-use is desktop-only). URL resolution = config first, CI fallback. Integration: upgrade Phase 6 + new post-deploy step. Issues auto-create stories via reflect classifier (links G6).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
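The host-aware selection and URL resolution described in this commit can be sketched roughly as follows. This is an illustrative sketch only, not the contents of dogfood-validation.md: the HostInfo type, the "unavailable" fallback label, the dogfood.url config key, and the DEPLOY_URL variable are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostInfo:
    """Hypothetical description of the executing host (not an AEP type)."""
    name: str                       # "claude-code" or "codex"
    headless: bool = False          # True for codex-exec style runs
    has_agent_browser: bool = False


def dogfood_method(host: HostInfo) -> str:
    """Pick a validation method for the host, following the commit's rules."""
    if host.name == "claude-code":
        # Claude Code auto-detects agent-browser.
        # The "unavailable" label for the missing case is an assumption.
        return "agent-browser" if host.has_agent_browser else "unavailable"
    if host.name == "codex":
        # computer-use is desktop-only, so headless runs use Playwright scripts.
        return "playwright-script" if host.headless else "native-browser"
    raise ValueError(f"unknown host: {host.name}")


def target_url(config: dict, ci_env: dict) -> Optional[str]:
    """Resolve the URL to validate: config first, CI fallback."""
    url = config.get("dogfood", {}).get("url")  # hypothetical config key
    if url:
        return url
    return ci_env.get("DEPLOY_URL")             # hypothetical CI variable
```

Returning None from target_url when neither source provides a URL leaves the caller free to skip validation or escalate; which of those the real workflow does is not stated here.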
…ood, telemetry reflect, self-feeding watch, visual eval, full-auto switch
Implements the retained loop-engineering gaps (G2–G7) plus the A1 full-auto master switch, all defaulting to human-in-the-loop (opt-in only).
- G2 recovery ladder: gen-eval/references/recovery-ladder.md; build Phase 5 and autopilot tick ④ climb same-fix → re-ground → fresh native-bg-subagent → decompose before the eval_not_converging human gate.
- G4 host-aware dogfood + post-merge guard: executor/references/dogfood-validation.md (dogfood_method()/target_url(), Claude=agent-browser, Codex=native/Playwright), autopilot/references/post-merge-guard.md + tick Step ③.5; build Phase 6 host-aware; on-issue → reflect story; hard regression → conservative auto_revert (default off).
- G5 telemetry reflect: reflect/references/telemetry-ingestion.md; reflect Step 1 auto-ingestion + Step 2.75 quantitative outcome auto-eval; tick layer-completion.
- G6 self-feeding discovery: new /aep-watch skill (registered in marketplace.json).
- G3 visual evaluator: Visual Design dimension in gen-eval scoring + evaluator contract.
- G7 loop hygiene: unified --max-turns budget; cap = possibly-unsolvable.
- A1 full_auto master switch (default false) gates strategic pauses; config keys added to product-context schema (all 3 templates).
Quick-reference updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
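As a rough illustration of the G2 recovery ladder, the climb order named above (same-fix → re-ground → fresh subagent → decompose, then the eval_not_converging human gate) could look like the sketch below. The function names and the shape of the returned dict are assumptions for the example; only the rung order, the eval_not_converging gate, and the recovery_rung field name come from this PR.

```python
from typing import Callable, Optional

# Rungs in the order named in the commit message.
RUNGS = ["same-fix", "re-ground", "fresh-subagent", "decompose"]


def climb(attempt: Callable[[str], bool]) -> Optional[str]:
    """Try each rung in order; return the rung that converged, or None."""
    for rung in RUNGS:
        if attempt(rung):
            return rung
    return None


def run_with_ladder(attempt: Callable[[str], bool]) -> dict:
    """Climb the ladder, escalating to the human gate only after every rung fails."""
    rung = climb(attempt)
    if rung is None:
        # All rungs exhausted: hand off to the human gate.
        return {"status": "escalated", "escalation": "eval_not_converging"}
    return {"status": "converged", "recovery_rung": rung}
```

The point of the ordering is that cheaper recoveries are tried first and the human gate is reached only once every automated rung has failed.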
- B1: author telemetry-ingestion.md in canonical _shared/references/ (was created in a build-generated dir and wiped by build-skills.sh); rebuild materializes it into reflect/ + watch/, so G5 + watch ingestion now resolve.
- S1: add guard_state entry to autopilot state-schema (post-merge-guard idempotency).
- S3: register post_merge_regression in the escalation type enum.
- S2: document recovery_rung in eval-protocol status.json fields.
- S4: schema health_signals example ci → ci_status (matches the guard's key).
- S5: skill count 16 → 17 in README + orientation; add /aep-watch to orientation table.
- N1: brief Codex dogfood recipe pointer in codex-native.md.
- oxfmt markdown reformatting.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
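To make the role of guard_state concrete, here is one way post-merge-guard idempotency might work: the guard records an outcome per merge and returns the recorded outcome on later ticks instead of acting twice. This is a sketch under assumptions, not the actual guard: keying by merge SHA, the "failing" status value, and the outcome strings are hypothetical; guard_state, ci_status, post_merge_regression, and auto_revert (default off) are the names used in this PR.

```python
def run_guard(state: dict, merge_sha: str, ci_status: str,
              auto_revert: bool = False) -> str:
    """Evaluate a merge at most once; repeat calls return the stored outcome."""
    guard_state = state.setdefault("guard_state", {})
    if merge_sha in guard_state:
        # Already evaluated on an earlier tick: do not revert or escalate again.
        return guard_state[merge_sha]
    if ci_status == "failing":  # hypothetical status value
        # auto_revert is opt-in; the default path escalates to a human.
        outcome = "reverted" if auto_revert else "escalated:post_merge_regression"
    else:
        outcome = "healthy"
    guard_state[merge_sha] = outcome
    return outcome
```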
…coverage guard)
Closes the v2 telemetry gap: consumers shipped without a way to decide/wire sources.
- Coverage rule + coverage_check() helper in telemetry-ingestion.md (canonical
_shared/references): a source is needed iff a quantitative success_metric or
health_signal requires it.
- /aep-map gains a Telemetry Binding step (the decision owner): bind each needed
signal to a detected/declared source via metric_map; flag the unmeasurable.
- /aep-scaffold audit detects the observability stack (Sentry/Datadog/PostHog/
OTel/health endpoint) → candidate telemetry_sources.
- /aep-watch (Step 0 precondition), /aep-reflect Step 2.75, and post-merge guard
run coverage_check() and BLOCK the auto path when the map binding is incomplete
("run /aep-map observability step") — never silently no-op.
- schema documents telemetry_sources[].metric_map + the coverage rule.
Folded into the unreleased v2.0.0 (PR #11). oxfmt + build-skills in sync.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
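The coverage rule in this commit (a source is needed iff a quantitative success_metric or health_signal requires it, and the auto path blocks when the binding is incomplete) can be illustrated with a small sketch. The signature below is hypothetical and is not the coverage_check() helper defined in telemetry-ingestion.md; it also assumes metric_map is keyed by signal name, which this PR does not confirm.

```python
def coverage_check(needed_signals, telemetry_sources):
    """Return the needed signals that no source's metric_map binds."""
    bound = set()
    for source in telemetry_sources:
        # Assumption: metric_map maps signal name -> source-specific query.
        bound.update(source.get("metric_map", {}).keys())
    return sorted(set(needed_signals) - bound)


def guard_auto_path(needed_signals, telemetry_sources):
    """Block the auto path instead of silently doing nothing."""
    missing = coverage_check(needed_signals, telemetry_sources)
    if missing:
        raise RuntimeError(
            f"telemetry binding incomplete for {missing}; "
            "run /aep-map observability step"
        )
    return True
```

Raising rather than returning False mirrors the "never silently no-op" requirement: a caller cannot ignore an incomplete binding by accident.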
Added: telemetry source determination (commit 94c76a2) Closes the gap where v2's telemetry consumers (G5 reflect auto-eval, G6
Folded into the unreleased v2.0.0. Passed a focused design-review (no blockers; fixed a dangling |
Summary
Implements the loop-engineering autonomy gaps from docs/research/loop-engineering-autonomy-gap.md (G2–G7) plus the A1 full_auto master switch. Builds on v1.8.0 (claude-team removed → native-bg-subagent default + post-spawn liveness probe). Every new capability defaults to human-in-the-loop; autonomy is opt-in via topology.routing flags. Version bumped to 2.0.0.
Research + design lineage is included in the branch (docs/research/loop-engineering-autonomy-gap.md, docs/research/g4-dogfood-validation-design.md) and decisions were taken interactively (G1 rejected; full-auto kept opt-in; grouped_change kept as the one documented exception to one-subagent-per-story).
What's in it
- G2 recovery ladder (gen-eval/references/recovery-ladder.md) wired into build Phase 5 + autopilot tick ④: same-fix → re-ground → fresh generator → decompose before the human gate.
- G4 post-merge guard (autopilot/references/post-merge-guard.md + tick ③.5): deploy-health monitoring, conservative auto_revert (default off).
- G4 host-aware dogfood (executor/references/dogfood-validation.md): Claude → agent-browser, Codex → native browser/computer-use or Playwright; config-first URL with CI fallback.
- G5 telemetry reflect (reflect/references/telemetry-ingestion.md): auto-ingestion + quantitative outcome auto-eval.
- G6 /aep-watch skill: self-feeding work discovery (registered in marketplace.json).
- G7 loop hygiene: unified --max-turns budget.
- A1 topology.routing.full_auto (default false): master switch over the strategic human gates.
Safety posture
- full_auto / auto_revert / auto_outcome_eval / watch.auto_create are explicit opt-ins.
- gh only; no workspace-code reads, no gh pr merge from main.
- native-bg-subagent + mandatory post-spawn liveness probe on every spawn path; one-launch = one-subagent = one-story invariant explicit (grouped_change is the documented exception).
Process
Built via parallel sub-agents (new files + per-file wiring), then a design-review subagent pass; its findings (1 blocker + 5 should-fix + nits) are all addressed in fix(aep-v2): address design review, notably authoring telemetry-ingestion.md in the canonical _shared/references/ (the build-generated copy had been wiped), plus state-schema guard_state / escalation-enum / recovery_rung registration and doc-count fixes.
Verification
bash scripts/build-skills.sh --check → in sync
🤖 Generated with Claude Code